Abstract

As experimental platforms for quantum information processing continue to mature, characteri-
zation of the quality of unitary gates that can be applied to their quantum bits (qubits) becomes
essential. Eventually, the quality must be sufficiently high to support arbitrarily long quantum
computations. Randomized benchmarking already provides a platform-independent method for
assessing the quality of one-qubit rotations. Here we describe an extension of this method to
multi-qubit gates. We provide a platform-independent protocol for evaluating the performance of
experimental Clifford unitaries, which form the basis of fault-tolerant quantum computing. We
implemented the benchmarking protocol with trapped-ion two-qubit phase gates and one-qubit gates
and found an error per random two-qubit Clifford unitary of 0.162 ± 0.008, thus setting the first
benchmark for such unitaries. By implementing a second set of sequences with an extra two-qubit
phase gate at each step, we extracted an error per phase gate of 0.069 ± 0.017. We conducted these
experiments with movable, sympathetically cooled ions in a multi-zone Paul trap, a system that
can in principle be scaled to larger numbers of ions.
I. INTRODUCTION

Quantum information processing (QIP) has the potential to solve difficult problems in
many-body quantum mechanics and mathematics that lack efficient algorithms on classical
computers. However, achieving useful QIP will require precise control of many qubits
(two-level quantum systems) and the ability to execute quantum gates (operations that
manipulate the quantum states of the qubits) with low error per gate. Here, the error per gate
(EPG) is $\epsilon = 1 - F$, where $F$ is the (average) gate fidelity, defined as the uniform average
over pure input states of $\langle\psi|\rho|\psi\rangle$, where $\rho$ is the (typically mixed) output state and $|\psi\rangle$ is
the intended output state (see Ref. [1]). A convincing demonstration of the potential for
practical fault-tolerant QIP should include verification of consistent EPGs below a threshold
of $10^{-\ldots}$.
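The definition of $F$ as a uniform average over pure states can be evaluated without doing the integral. The sketch below uses a standard identity that is not stated in the text: $F = (dF_e + 1)/(d + 1)$, where $F_e = \sum_k |\mathrm{Tr}(U^\dagger K_k)|^2/d^2$ is the entanglement fidelity of a channel with Kraus operators $K_k$ relative to the target unitary $U$. All helper names are hypothetical. For a two-qubit depolarizing channel with depolarization probability $p$, the sketch should return an EPG of $\epsilon = 3p/4$, equivalently $p = 4\epsilon/3$.

```python
import math

# Single-qubit Pauli matrices as nested lists.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    m = len(b)
    n = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n)] for i in range(n)]


def trace_overlap(u, k):
    """Tr(U^dagger K) for square matrices U and K."""
    d = len(u)
    return sum(complex(u[i][j]).conjugate() * k[i][j] for i in range(d) for j in range(d))


def average_gate_fidelity(kraus_ops, target):
    """F = (d F_e + 1) / (d + 1), with F_e = sum_k |Tr(U^dagger K_k)|^2 / d^2."""
    d = len(target)
    f_ent = sum(abs(trace_overlap(target, k)) ** 2 for k in kraus_ops) / d**2
    return (d * f_ent + 1) / (d + 1)


def two_qubit_depolarizing(p):
    """Kraus operators of rho -> (1 - p) rho + p I/4, written as a Pauli channel."""
    paulis = [kron(a, b) for a in (I2, X, Y, Z) for b in (I2, X, Y, Z)]
    weights = [1 - p + p / 16] + [p / 16] * 15  # paulis[0] is the identity
    return [[[math.sqrt(w) * x for x in row] for row in pm] for w, pm in zip(weights, paulis)]


identity4 = kron(I2, I2)
p = 0.2
epg = 1 - average_gate_fidelity(two_qubit_depolarizing(p), identity4)
print(epg, 4 * epg / 3)  # EPG, and the depolarization probability recovered from it
```

The Pauli-channel form is used because it makes the Kraus operators explicit. Any other Kraus decomposition of the same channel gives the same $F$.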



So far, there has been substantial experimental progress on the basic techniques needed 
for QIP, including the manipulation of small numbers of qubits and the implementation 
of the basic quantum gates that are needed to perform useful algorithms [4]. The main
challenges for QIP experiments that remain are to scale up to larger numbers of qubits and to 
decrease the EPG below the fault-tolerant threshold. Therefore, it is desirable to efficiently 
characterize or benchmark the performance of multi-qubit QIP experiments so as to extract 
the EPG of specific gates and enable comparison between different quantum computing 
platforms. With these goals in mind, we give a benchmarking protocol for arbitrary numbers 
of qubits and show the results from an experimental implementation for two qubits. The 
protocol builds on previous work that used randomized sequences of Clifford gates to measure 
the EPG of one-qubit gates, first implemented in Refs. [5, 6].

Compared to techniques such as process tomography [7, 8], randomized benchmarking
offers several key advantages for characterizing EPGs of quantum gates. For example, while
process tomography offers more complete information about the performance of a gate, it
does not scale efficiently with the number of qubits in the system, it cannot readily measure
EPGs below the error probabilities of state preparation and readout, and it does not verify
performance of a gate in arbitrary computational contexts. In contrast, randomized
benchmarking can determine EPGs with a number of measurements that scales polynomially with
the number of qubits [6, 9]. Because randomized benchmarking measures an exponential
decay of fidelity as a function of the number of gates in the sequences, errors in state
preparation and readout do not limit the minimum EPG that one can measure. Also, randomized
benchmarking involves gates in the context of long sequences of operations and therefore
establishes an EPG that takes into consideration a computational context similar to that
expected in the implementation of lengthy QIP algorithms. Because of these advantages,
randomized benchmarking following the protocols of Refs. [5, 6] has been used to measure
one-qubit gate errors in a range of systems including trapped ions [5, 10], superconducting
qubits [11, 12], liquid-state NMR [6], and neutral atoms in an optical lattice [13]. Recently,
randomized benchmarking was used to measure an EPG of $2.0(2) \times 10^{-5}$ for one-qubit
operations with a trapped ion [10].

A number of previous works have described properties of two-qubit gates with various
measurement techniques. With trapped ions, the fidelity for creating a Bell state has been
measured at 0.83(1) [14], 0.97(2) [15], 0.89(3) [16], 0.83(3) [17] and 0.993(1) [18]. Process
tomography was used to look at single and repeated applications of a two-qubit entangling
gate [19, 20]. The average fidelities were found to be 0.938(3) for one and 0.882(4) for
two gates in Ref. [20]. Two-qubit gates have also been studied in other quantum computing
platforms, including superconducting and photonic qubits (see Ref. [4] and citations therein),
with measured fidelities ranging from 0.90 to 0.99, disregarding photon loss for photonic
qubits. In a liquid-state NMR system, a randomized benchmarking technique was used to
study the errors of sequences of randomized gates on three nuclear spins [6] and found EPGs
of 0.0047(3). The gates in this experiment were randomly chosen in a platform-dependent
way from a special-purpose probability distribution where the probability of a two-qubit
gate (the CNOT) was 1/3. However, gate sets vary by platform, and other experiments
may choose different probability distributions, for example to improve randomization. As a
result, the error probabilities from Ref. [6] may be difficult to compare to those obtained in
future experiments.

The multi-qubit protocol we describe first establishes a platform-independent error per
operation (EPO)¹ for Clifford unitaries by applying random sequences of Clifford unitaries
of varying lengths. Here, a Clifford unitary is any operator in the Clifford group defined
below. It then determines the EPG of individual gates of our choice by inserting them into
these sequences. The individual gates to be characterized may depend on the platform. Of
particular interest are implementations of one of the standard universal two-qubit gates, such
as the controlled-NOT (CNOT), the phase gate, or the square root of SWAP. The basic principles
of the protocol are similar to the theoretical randomized Clifford-based benchmarking sequences
described and analyzed in Ref. [9]. However, like the standard protocol used in one-qubit
benchmarking experiments so far, the last gate of the sequence does not strictly reverse the
effects of the previous ones, thus enabling the protocol to detect certain large errors that
can otherwise masquerade as no errors. In addition, we discuss the practical aspects of
choosing the random Clifford unitaries in the sequence and extend the protocol to enable
characterization of specific gates, thus enabling diagnostics that were previously unavailable
in randomized benchmarking. The proposed protocol is flexible without affecting the ability
to compare results from unrelated implementations. We note that the theoretical relationship
between the protocol's EPGs and EPOs and the detailed physical noise parameters is
not known in general [9]. However, we suggest that, subject to simple consistency checks,
the protocol's reported EPGs and EPOs are nevertheless useful quantities for comparison
and reflect computationally relevant error behavior.

¹ We use the convention that EPGs refer to processes that are intended to implement elementary quantum
gates, whereas EPOs refer to processes that implement quantum circuits that may scale with the number
of qubits. In both cases, the gates or circuits need to be specified to interpret reported values.



For our demonstration with trapped ions, we take advantage of a multi-zone ion trap [21].
The universal two-qubit gate chosen here is a phase gate, $G$, implemented via a
Mølmer–Sørensen gate [22] and acting as the diagonal matrix with diagonal $[1, i, i, 1]$ in the basis
labeled by $|{\uparrow\uparrow}\rangle, |{\uparrow\downarrow}\rangle, |{\downarrow\uparrow}\rangle, |{\downarrow\downarrow}\rangle$, where $|{\downarrow}\rangle$ and $|{\uparrow}\rangle$ represent the two eigenstates of
each qubit, with $\sigma_z|{\uparrow}\rangle = +|{\uparrow}\rangle$, etc. Qubit addressing is implemented by separation of the
ions into different wells, and long sequences of gates are supported by sympathetic cooling
techniques, as required for the approach to scalable ion-trap quantum computing described in
Refs. [23, 24]. The experiment extends the technology demonstrated in [20] by using longer
sequences of gates and a different implementation of the phase gate [25] to act directly on a
magnetic-field-insensitive transition in ⁹Be⁺.

From sequences of up to seven Clifford unitaries, each requiring an average of 1.5 phase 
gates, we deduced an EPO of 0.162(8) for the Clifford unitaries and an EPG of 0.069(17) for 
the phase gates. Although we implemented relatively long sequences, the experiment does 
not yet demonstrate stationary behavior because ion loss prevented routine implementation 
of longer sequences. There are also indications that the errors increased with sequence length 
by two to three standard deviations, with the EPOs ranging from 0.144(11) to 0.185(20) 
and the EPGs from 0.048(26) to 0.120(44) as the sequences lengthened. Our EPO sets the
first benchmark for random two-qubit Clifford unitaries. The EPG shows no improvement
over the gates used in [20], but applies to gates used in computationally relevant contexts
in longer sequences.

The paper is structured as follows: We first describe the protocol and its main features for 
two qubits. We then discuss the experimental implementation of the protocol and show the 
experimental results. The data-analysis methods are detailed next, followed by a discussion
of necessary consistency checks and estimates of physical sources of error. We finally define 
the protocol for arbitrary numbers of qubits, and make recommendations for how to apply 
and compare it when qubit numbers vary. 

II. BENCHMARKING PROTOCOL 

Clifford unitaries are fundamental to most error-correcting procedures envisioned for
quantum computing (see, for example, Ref. [26]) and thus serve as a foundation on which
universal fault-tolerant quantum computing is built. As a result, a large fraction of the
fundamental processes in proposed quantum computing architectures involve Clifford unitaries.
The three main features of the group of Clifford unitaries that make it useful for our pur- 
poses are that its members have compact representations that can be efficiently converted to 
circuits of elementary quantum gates, outcomes of standard measurements of sequences of 
Clifford unitaries can be efficiently predicted by classical computation, and the group is suf- 
ficiently rich that error operators can be perfectly depolarized. These features are explained 
in context below. 

For a system of $n$ qubits, Clifford unitaries can be constructed by combining one-qubit
$\pm\pi/2$ rotations about the $x$ and $y$ axes, defined as $R_u(\pm\pi/2) = e^{\mp i\pi\sigma_u/4}$ with $u = x, y$, and
two-qubit CNOT gates. Alternatively, the Clifford unitaries are the members of the Clifford
group, which is defined as the set of unitaries $U$ with the property that for every Pauli
operator $P$, $UPU^\dagger$ is a signed product of Pauli operators. We consider two gates or unitaries
that differ only by a global phase to be identical.
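The defining property is easy to test numerically for two qubits. The sketch below uses hypothetical helper names in pure Python and is exponential in the number of qubits, so it is only illustrative. It conjugates each one-qubit Pauli generator by a $4 \times 4$ unitary and checks whether the result is $\pm 1$ times a Pauli product. It assumes the basis order $|{\uparrow\uparrow}\rangle, |{\uparrow\downarrow}\rangle, |{\downarrow\uparrow}\rangle, |{\downarrow\downarrow}\rangle$ used for the phase gate $G = \mathrm{diag}(1, i, i, 1)$ in this work. It should report that $G$ is a Clifford unitary, while a $\pi/4$ phase rotation (a T gate) on one qubit is not.

```python
import cmath
import itertools

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    m = len(b)
    n = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


# All 16 two-qubit Pauli products, keyed by labels such as "XZ" (qubit 1 first).
PAULIS = {
    p + q: kron(a, b)
    for (p, a), (q, b) in itertools.product(zip("IXYZ", (I2, X, Y, Z)), repeat=2)
}


def as_signed_pauli(m, tol=1e-9):
    """Return (sign, label) if m equals +1 or -1 times a Pauli product, else None."""
    for label, pm in PAULIS.items():
        # Pauli products are orthogonal under Tr(A^dagger B), each with norm 4.
        c = sum(complex(pm[i][j]).conjugate() * m[i][j] for i in range(4) for j in range(4)) / 4
        if abs(c.imag) < tol and abs(abs(c) - 1) < tol and all(
            abs(m[i][j] - c * pm[i][j]) < tol for i in range(4) for j in range(4)
        ):
            return round(c.real), label
    return None


def is_clifford(u):
    """U is Clifford iff U P U^dagger is a signed Pauli product for each generator P."""
    generators = [PAULIS[k] for k in ("XI", "ZI", "IX", "IZ")]
    return all(as_signed_pauli(matmul(matmul(u, pm), dagger(u))) is not None for pm in generators)


# Basis order |up,up>, |up,down>, |down,up>, |down,down>.
G = [[1, 0, 0, 0], [0, 1j, 0, 0], [0, 0, 1j, 0], [0, 0, 0, 1]]
T_ON_QUBIT_1 = kron([[1, 0], [0, cmath.exp(1j * cmath.pi / 4)]], I2)

print(is_clifford(G), is_clifford(T_ON_QUBIT_1))
print(as_signed_pauli(matmul(matmul(G, PAULIS["XI"]), dagger(G))))
```

Checking the four generators suffices because conjugation preserves products. Once each generator is mapped to a signed Pauli product, so is every other Pauli product. For example, $G$ maps $\sigma_x \otimes 1$ to $\sigma_y \otimes \sigma_z$.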

The randomized benchmarking protocol is an extension of "Clifford twirling" [1]. In the
simplest instance of Clifford twirling, an arbitrary quantum process $\mathcal{V}$ is sandwiched between
a random Clifford unitary $C$ picked from a uniform distribution and its inverse $C^\dagger$.
Alternatively, we can think of Clifford twirling as averaging the process $C_i^\dagger \mathcal{V} C_i$ over all elements
$C_i$ in the set $\mathcal{C}$ of Clifford unitaries. The key property of Clifford twirling is that this new
process behaves like one that uniformly depolarizes with some probability. In other words, a
single parameter, the probability that a pure input state is mapped to an orthogonal state,
characterizes the new, average process [1]. When $\mathcal{V}$ is a noisy implementation of the identity
gate, such as a long self-reversing sequence of gates, we use this parameter as the definition
of the average error of $\mathcal{V}$. Clifford twirling can be generalized to learn the average error of
an arbitrary process $\mathcal{V}$ intended to implement a specific Clifford unitary $U$: the inverting
Clifford unitary $C^\dagger$ that is applied after the process is modified to an implementation of
the unitary $UC^\dagger U^\dagger$. With this modification, the net effect is $U$ if there are no errors in
$\mathcal{V}$, since $(UC^\dagger U^\dagger)\,U\,C = U$. Because Clifford unitaries form a group, $UC^\dagger U^\dagger$ is also Clifford. The
implementation of $UC^\dagger U^\dagger$ should not rely on $\mathcal{V}$ to provide $U$, and it is better not to decompose it
into a composition of three processes according to the given expression. This can be satisfied by
first evaluating the unitary operator $UC^\dagger U^\dagger$ as an element of the Clifford group and then
implementing it by an efficient procedure for translating Clifford unitaries into quantum
circuits [27, 28]. If $\mathcal{V}$ contains errors, then the net process applies the unitary $U$ followed
by a uniformly depolarizing error whose parameter defines the average error of $\mathcal{V}$.
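To see why twirling leaves a single parameter, the sketch below works in the smallest setting: one qubit, where the Clifford group modulo global phase has 24 elements. It generates them by breadth-first search from $R_x(\pi/2)$ and $R_y(\pi/2)$. It then twirls a coherent over-rotation error $e^{-i\varepsilon\sigma_x/2}$, an illustrative error model chosen here that is not taken from the experiment. Before twirling, the error preserves $\sigma_x$ but rotates $\sigma_y$ and $\sigma_z$. After twirling, every Pauli operator should be shrunk by the same factor $f = (1 + 2\cos\varepsilon)/3$, which is the signature of a depolarizing process. Helper names are hypothetical.

```python
import math

I2 = ((1, 0), (0, 1))
SX = ((0, 1), (1, 0))
SY = ((0, -1j), (1j, 0))
SZ = ((1, 0), (0, -1))


def mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)) for i in range(2))


def dag(a):
    return tuple(tuple(complex(a[j][i]).conjugate() for j in range(2)) for i in range(2))


def lin(ca, a, cb, b):
    """Linear combination ca * a + cb * b of 2x2 matrices."""
    return tuple(tuple(ca * a[i][j] + cb * b[i][j] for j in range(2)) for i in range(2))


def phase_key(u):
    """Hashable key that identifies u up to a global phase."""
    flat = [complex(x) for row in u for x in row]
    pivot = next(x for x in flat if abs(x) > 1e-9)
    ph = pivot.conjugate() / abs(pivot)
    return tuple((round((x * ph).real, 9), round((x * ph).imag, 9)) for x in flat)


def one_qubit_cliffords():
    """Breadth-first generation of the 24 one-qubit Cliffords from Rx(pi/2) and Ry(pi/2)."""
    r = 1 / math.sqrt(2)
    gens = [lin(r, I2, -1j * r, SX), lin(r, I2, -1j * r, SY)]
    group, frontier = {phase_key(I2): I2}, [I2]
    while frontier:
        new = []
        for u in frontier:
            for g in gens:
                v = mul(g, u)
                if phase_key(v) not in group:
                    group[phase_key(v)] = v
                    new.append(v)
        frontier = new
    return list(group.values())


def twirl(channel, group):
    """Return rho -> mean over C of C^dagger channel(C rho C^dagger) C."""
    def averaged(rho):
        acc = ((0, 0), (0, 0))
        for c in group:
            acc = lin(1, acc, 1, mul(dag(c), mul(channel(mul(mul(c, rho), dag(c))), c)))
        return lin(1 / len(group), acc, 0, acc)
    return averaged


def shrink_factor(channel, pauli):
    """Tr(P channel(P)) / 2: how strongly the process preserves the Pauli operator P."""
    out = mul(pauli, channel(pauli))
    return (out[0][0] + out[1][1]).real / 2


eps = 0.3  # over-rotation angle (illustrative value)
err = lin(math.cos(eps / 2), I2, -1j * math.sin(eps / 2), SX)


def over_rotation(rho):
    return mul(mul(err, rho), dag(err))


cliffords = one_qubit_cliffords()
twirled = twirl(over_rotation, cliffords)
print(len(cliffords))
print([round(shrink_factor(over_rotation, P), 6) for P in (SX, SY, SZ)])  # unequal
print([round(shrink_factor(twirled, P), 6) for P in (SX, SY, SZ)])  # all equal
```

The twirled factor is the average of the untwirled factors $(1, \cos\varepsilon, \cos\varepsilon)$, because the Clifford group permutes the three axes uniformly.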

Randomized benchmarking extends the idea of Clifford twirling from the simple three- 
step sequence described above to randomized sequences of Clifford unitaries with errors. 
These sequences consist of steps where each step implements a randomly chosen Clifford
unitary and may have errors. Each step in the sequence simultaneously acts as a process 
that undergoes twirling and contributes to the twirling of errors in the other steps. Under 
optimistic assumptions described later, each step effectively behaves as an ideal unitary 
followed by a depolarizing process. The first goal is to establish the average error per step, 
which can then be reported as the EPO for Clifford unitaries. 

The method for implementing the Clifford unitaries making up the steps in the ran- 
domized sequences is up to the experimenter. Here we describe our approach. To improve 
stability of the twirling process and take advantage of the typically lower errors of one-qubit 
gates, each step is composed of a Clifford unitary preceded by a Pauli unitary, where the two 
parts are chosen so that together they implement a uniformly random Clifford unitary. The 
choice of Pauli unitary on each qubit is random and independent from the choice of Clifford 
unitary; each Pauli unitary involves applying either no pulse or a major-axis $\pi$-pulse. There
are eight possible such pulses, acting as $e^{\pm i\pi\sigma_u/2}$, where $\sigma_u$ is a Pauli matrix or the identity:
$\sigma_u \in \{1, \sigma_x, \sigma_y, \sigma_z\}$. The sign in the exponent affects only the global phase and results in
two choices for each possible matrix in the exponent. We keep the sign because in many
cases, including ours, the change in sign can involve a physically different device setting, such
as the phase in a pulse generator that determines the orientation of the fields that mediate
the pulse. Each qubit's $\pi$ pulse is chosen uniformly at random from the above eight pulses.
Because of this Pauli randomization procedure, it suffices to choose the unitary in the
second part of the step uniformly at random from the Clifford group modulo the group of
Pauli products. For this we can take advantage of the fact that the group of Pauli products
is a normal subgroup of the Clifford group, and the quotient group (of Clifford unitaries
modulo Pauli products) has a representation in terms of binary symplectic matrices $M$ of
dimension $2n \times 2n$ such that $MSM^T = S$ modulo 2, where $S$ is a $2 \times 2$ block matrix with
$n \times n$ blocks whose diagonal blocks are zero and whose off-diagonal blocks are the identity;
see, for example, Refs. [29, 30]. The terminology is based on Ref. [31]. For two qubits, there
are 720 such matrices $M$. Uniformly and randomly choosing from among these matrices is
computationally straightforward and efficient.
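For two qubits, the matrices can simply be enumerated and one of them drawn with `random.choice`. The sketch below uses hypothetical helper names. It assumes the bit-vector layout $(x_1, \dots, x_n, z_1, \dots, z_n)$ for a Pauli product, so that $S$ has zero diagonal blocks and identity off-diagonal blocks as described above. It builds $M$ row by row, using the fact that $(MSM^T)_{ij}$ is the symplectic product of rows $i$ and $j$. It should find 6 matrices for one qubit and 720 for two. Exhaustive enumeration is only practical for very small $n$; larger $n$ calls for a direct sampling algorithm.

```python
import itertools
import random


def symplectic_product(a, b):
    """a S b^T mod 2 for bit vectors laid out as (x_1..x_n, z_1..z_n)."""
    n = len(a) // 2
    return sum(a[k] * b[k + n] + a[k + n] * b[k] for k in range(n)) % 2


def s_entry(i, j, n):
    """Entry (i, j) of S: zero diagonal blocks, identity off-diagonal blocks."""
    return 1 if abs(i - j) == n else 0


def symplectic_matrices(n):
    """All 2n x 2n binary matrices M with M S M^T = S (mod 2), as tuples of rows."""
    vectors = list(itertools.product((0, 1), repeat=2 * n))
    found = []

    def extend(rows):
        i = len(rows)
        if i == 2 * n:
            found.append(tuple(rows))
            return
        for v in vectors:
            # Row i must have the prescribed symplectic product with every earlier row.
            # A row's product with itself is always 0, matching the zero diagonal of S.
            if all(symplectic_product(rows[j], v) == s_entry(j, i, n) for j in range(i)):
                extend(rows + [v])

    extend([])
    return found


two_qubit = symplectic_matrices(2)
print(len(symplectic_matrices(1)), len(two_qubit))
m_c = random.choice(two_qubit)  # uniformly random Clifford modulo Pauli products
for row in m_c:
    print(row)
```

Invertibility does not need a separate check: $MSM^T = S$ with $S$ invertible already forces $\det M \neq 0$.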

Determining an implementation of the Clifford unitary described by such a matrix in 
terms of the elementary gates available in a particular experiment is more challenging. 
There are efficient algorithms that translate an arbitrary symplectic binary matrix into
circuits of order $n^2/\log(n)$ elementary one- and two-qubit gates [27, 28], each of which can then be
mapped into experimentally available operations. However, there is strong motivation to 
obtain shorter implementations, as this is a sure way to improve the measured EPO. While 
it is unlikely that optimal implementations can be readily obtained for arbitrary numbers 
of qubits, we used the following strategy for two qubits optimized for our demonstration: 
By exhaustively listing compact circuits of one-qubit Clifford gates and phase gates G, we 
determined for each of the 720 symplectic binary matrices a circuit with the minimum 
number of phase gates implementing the corresponding Clifford unitary (modulo a Pauli 
product). On average, 1.5 phase gates were required. These circuits were then translated 
into appropriate actions in our ion-trap platform. 
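The exhaustive search for circuits with the fewest phase gates can be sketched as a 0-1 breadth-first search over the 720 symplectic matrices, where one-qubit Clifford gates cost nothing and each $G$ costs one. This is a reconstruction under stated assumptions, not the code used for the experiment. It works modulo Pauli products. It uses the Hadamard and $R_z(\pi/2)$ gates as one-qubit generators; any generating set of the one-qubit Cliffords gives the same counts. It uses the symplectic action of $G = \mathrm{diag}(1, i, i, 1) \propto e^{-i\pi\sigma_z^{(1)}\sigma_z^{(2)}/4}$, which maps $X_1 \to \pm Y_1 Z_2$ and $X_2 \to \pm Z_1 Y_2$ and leaves $Z_1, Z_2$ unchanged. It should reproduce the average of 1.5 phase gates quoted above. Names are hypothetical.

```python
from collections import Counter, deque

N = 2
DIM = 2 * N  # bit-vector layout (x1, x2, z1, z2) for Pauli products mod phase


def matrix_from_rule(rule):
    """Binary matrix whose column j is rule(e_j); rule must be linear over GF(2)."""
    basis = [tuple(int(i == j) for i in range(DIM)) for j in range(DIM)]
    cols = [rule(e) for e in basis]
    return tuple(tuple(cols[j][i] for j in range(DIM)) for i in range(DIM))


def matmul2(a, b):
    """Matrix product over GF(2)."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(DIM)) % 2 for j in range(DIM))
        for i in range(DIM)
    )


def hadamard(k):
    def rule(v):
        v = list(v)
        v[k], v[k + N] = v[k + N], v[k]  # X_k <-> Z_k
        return tuple(v)
    return matrix_from_rule(rule)


def rz90(k):
    def rule(v):
        v = list(v)
        v[k + N] ^= v[k]  # X_k -> +-Y_k (proportional to X_k Z_k); Z_k unchanged
        return tuple(v)
    return matrix_from_rule(rule)


def phase_gate_g():
    def rule(v):
        x1, x2, z1, z2 = v
        parity = x1 ^ x2  # X1 -> +-Y1 Z2, X2 -> +-Z1 Y2, Z1 and Z2 fixed
        return (x1, x2, z1 ^ parity, z2 ^ parity)
    return matrix_from_rule(rule)


def min_g_counts():
    """0-1 BFS: fewest G gates needed to reach each symplectic matrix."""
    free = [hadamard(0), hadamard(1), rz90(0), rz90(1)]  # cost 0
    moves = [(f, 0) for f in free] + [(phase_gate_g(), 1)]
    start = tuple(tuple(int(i == j) for j in range(DIM)) for i in range(DIM))
    cost = {start: 0}
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for gen, w in moves:
            nxt = matmul2(gen, m)  # append one more gate to the circuit
            if nxt not in cost or cost[m] + w < cost[nxt]:
                cost[nxt] = cost[m] + w
                if w == 0:
                    queue.appendleft(nxt)  # zero-cost moves are explored first
                else:
                    queue.append(nxt)
    return cost


costs = min_g_counts()
histogram = Counter(costs.values())
print(len(costs), sorted(histogram.items()), sum(costs.values()) / len(costs))
```

The histogram should show 36 local elements (no $G$), 324 needing one $G$, 324 needing two, and 36 needing three, giving $1080/720 = 1.5$. Translating each minimal circuit into actual ion-trap pulses is a separate, platform-specific step.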

Given the method for generating the random unitaries for one step, a benchmarking
experiment is configured by first deciding on a set of lengths $l_1 < \dots < l_k$ that determine
the numbers of steps in sequences to be generated. The EPO is determined by fitting an
exponential decay to fidelities ($1 - E$, where $E$ is the error probability) measured for each
sequence length. The choice of lengths therefore contributes to how well the EPO can be
extracted. In particular, there should be enough lengths for stable curve fitting, and lengths
much greater than the inverse EPO contribute little additional information, because by then the
error probability is already close to its fully depolarized value (3/4 for two qubits). For each
length $l$, many sequences of $l$ random steps are produced. At the end of each such sequence, a
randomized measurement step is added. This step consists of a Pauli randomization followed
by a Clifford unitary that inverts the $l$ preceding Clifford steps. The final Clifford unitary
is chosen independently of the Pauli randomization. This ensures that in the absence of
errors, the final state is again in the computational basis but randomized. Which basis
element it should be in can be computed by use of standard efficient methods for simulating
sequences of Clifford unitaries. The sequence can then be experimentally implemented
after preparation of each qubit in the $-1$ eigenstate of $\sigma_z$ and followed by a measurement
in the $\sigma_z$ basis of each qubit.

FIG. 1. Example of Clifford unitary generation. First a random binary symplectic $4 \times 4$
matrix $M_C$ is generated. In general, the size of such matrices is twice the number $n$ of qubits. They
efficiently encode a Clifford unitary of size $2^n \times 2^n$. The second step is to convert $M_C$
into a sequence of elementary gates that enacts the corresponding Clifford unitary and is suitable
for implementation in the ion-trap platform. We minimize the number of $G = \mathrm{diag}(1, i, i, 1)$ gates
in such sequences. The sequence found for the example $M_C$ shown is given on the right.
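For two qubits, a brute-force state-vector calculation is a simple, if unscalable, stand-in for the efficient stabilizer-simulation methods used to predict the ideal outcome. The sketch below uses hypothetical helper names and a hypothetical gate pool. It applies a random sequence of $R_x(\pm\pi/2)$, $R_y(\pm\pi/2)$ and $G$ gates to $|{\downarrow\downarrow}\rangle$, then a random Pauli product, then the exact inverse of the sequence. It returns the outcome probabilities. Because $U^\dagger P U$ is again a Pauli product for any Clifford $U$, one outcome should have probability 1, and which one it is depends on the random Pauli. Using the exact matrix inverse instead of a compiled Clifford circuit, and omitting the per-step Pauli randomization, are simplifications.

```python
import cmath
import math
import random

I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]
# Phase gate G in the basis order |up,up>, |up,down>, |down,up>, |down,down>.
G = [[1, 0, 0, 0], [0, 1j, 0, 0], [0, 0, 1j, 0], [0, 0, 0, 1]]
LABELS = ["up,up", "up,down", "down,up", "down,down"]


def kron(a, b):
    m = len(b)
    n = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def rot(theta, phi):
    """R(theta, phi) = exp(-i theta sigma_phi / 2) with sigma_phi = cos(phi) X + sin(phi) Y."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s * cmath.exp(-1j * phi)], [-1j * s * cmath.exp(1j * phi), c]]


def random_gate(rng):
    """Hypothetical gate pool: G, or a pi/2 rotation about +-x or +-y on one qubit."""
    choice = rng.randrange(3)
    if choice == 2:
        return G
    one = rot(math.pi / 2, rng.choice([0, math.pi / 2, math.pi, 3 * math.pi / 2]))
    return kron(one, I2) if choice == 0 else kron(I2, one)


def outcome_probabilities(num_gates, rng):
    """Ideal outcome probabilities after sequence, Pauli randomization, and exact inverse."""
    seq = kron(I2, I2)
    for _ in range(num_gates):
        seq = matmul(random_gate(rng), seq)
    pauli = kron(rng.choice([I2, SX, SY, SZ]), rng.choice([I2, SX, SY, SZ]))
    net = matmul(dagger(seq), matmul(pauli, seq))
    state = [row[3] for row in net]  # column 3 = net applied to |down,down>
    return [abs(a) ** 2 for a in state]


rng = random.Random(7)
probs = outcome_probabilities(10, rng)
best = max(range(4), key=lambda k: probs[k])
print(LABELS[best], probs[best])
```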

One should implement sufficiently many runs of each sequence to have good signal-to-noise
on the inferred probabilities of getting a correct or incorrect answer in the measurement
for this particular sequence. The process of generating and implementing random sequences
at each length is repeated in order to ensure randomization of the unitaries and their
associated implementation errors. For our two-qubit benchmarking demonstration, we used the
set of lengths {1, 2, 3, 4, 5, 6} and generated between 15 and 55 random sequences of each
length. The variation in numbers of sequences is explained below. We implemented 100
runs² for each sequence to determine its probability of error.

The experimental runs yield an average probability of error $E(l)$ for each length $l$, where
the average is over the sequences of this length and their runs. To analyze $E(l)$ we start
by making the simple assumption that each step's error behaves as a completely depolarizing
channel (see, for example, Ref. [30], p. 378) characterized by error probability $E_s$,
independent of its gates or position in the sequence. Similarly, we assume an overall error
probability $E_m$ for state preparation, the last inverting gate and its Pauli randomization, and
measurement. Then the mean of $E(l)$ with respect to repetitions of the experiment satisfies

$$\langle E(l)\rangle = \frac{3}{4}\left[1 - \left(1 - \frac{4E_m}{3}\right)\left(1 - \frac{4E_s}{3}\right)^{l}\right] \tag{1}$$

for two qubits. The case of more than two qubits is discussed later, where the protocol is
defined for arbitrary numbers of qubits. Note that $4\epsilon/3$ is the depolarization probability if the
error probability is $\epsilon$, because a fully depolarized two-qubit state gives an incorrect outcome
with probability 3/4. The probability of not depolarizing the state in a sequence is the product
of the probabilities of not depolarizing in each step. To derive the equation, note that in terms
of $E(l)$, the probability of not depolarizing is $1 - 4E(l)/3$.
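Equation (1) implies $1 - 4\langle E(l)\rangle/3 = (1 - 4E_m/3)(1 - 4E_s/3)^l$, whose logarithm is linear in $l$. The sketch below uses that observation to estimate $E_s$ and $E_m$ with an ordinary least-squares line fit. The linearized fit is an illustrative choice and not necessarily the procedure used for the reported results. The input values are synthetic, generated from Eq. (1) itself, not experimental data. Function names are hypothetical.

```python
import math


def model_error(l, e_s, e_m):
    """Eq. (1): mean error probability after l random steps, for two qubits."""
    return 0.75 * (1 - (1 - 4 * e_m / 3) * (1 - 4 * e_s / 3) ** l)


def fit_epo(lengths, errors):
    """Least-squares fit of log(1 - 4E/3) = log(1 - 4E_m/3) + l * log(1 - 4E_s/3)."""
    ys = [math.log(1 - 4 * e / 3) for e in errors]  # requires every E < 3/4
    n = len(lengths)
    mean_l = sum(lengths) / n
    mean_y = sum(ys) / n
    slope = sum((l - mean_l) * (y - mean_y) for l, y in zip(lengths, ys)) / sum(
        (l - mean_l) ** 2 for l in lengths
    )
    intercept = mean_y - slope * mean_l
    return 0.75 * (1 - math.exp(slope)), 0.75 * (1 - math.exp(intercept))


lengths = [1, 2, 3, 4, 5, 6]
synthetic = [model_error(l, e_s=0.16, e_m=0.05) for l in lengths]  # noiseless, synthetic
print(fit_epo(lengths, synthetic))
```

With real data the logarithm amplifies noise as $E(l)$ approaches 3/4, so a weighted or direct nonlinear fit of Eq. (1) is usually preferable. The linear form is shown only because it makes the role of $E_s$ as a decay parameter explicit.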

Assuming that the experimental observations are consistent with the simple exponential
behavior suggested by this formula, we use it as the defining formula for the EPO $E_s$ of a
random Clifford unitary, regardless of the actual behavior of errors. In particular, in the
context of these benchmarks, we associate the EPO with the decay parameter of the error
probabilities $E(l)$ rather than a particular exact parameter of the underlying physical errors.
This supports the platform-independent use of randomized benchmarking. If the simple
depolarizing assumption does not hold, then $E(l)$ may exhibit non-exponential and transient
behaviors; see the discussion below. However, the twirling effected by the randomization is
intended to induce behavior that matches the one implied by this assumption.

To isolate the EPG of the phase gate $G$ (or any other gate) we generate a second set of
sequences by inserting $G$ after each random Clifford unitary. The final inverting Clifford
unitary is chosen in the same way as before, taking into account the effect of the additional $G$
gates to ensure that the final state is a predictable computational basis state in the absence
of errors. Under the same idealizing assumptions that yield Eq. (1), the average probability
of error $E'(l)$ measured for the implementation of this experiment satisfies Eq. (1) but with a
different value of $E_s$ due to the additional operation in each step. In an ideal experiment $E_m$
should be the same, but the model must take into consideration that it might have changed,
for example due to experimental drifts. Explicitly,

$$\langle E'(l)\rangle = \frac{3}{4}\left[1 - \left(1 - \frac{4E'_m}{3}\right)\left(1 - \frac{4E'_s}{3}\right)^{l}\right], \tag{2}$$

where $E'_s$ is the probability of error of a step consisting of a random Clifford gate and $G$.
In this context, the assumptions on the error behavior of $G$ could be relaxed from simple
depolarization. We can isolate the EPG $E_G$ of $G$ by solving the identity

$$\left(1 - \frac{4E'_s}{3}\right) = \left(1 - \frac{4E_s}{3}\right)\left(1 - \frac{4E_G}{3}\right), \tag{3}$$

which gives

$$E_G = \frac{3}{4}\left[1 - \frac{1 - 4E'_s/3}{1 - 4E_s/3}\right].$$

² Due to an undetermined problem in the control code, for approximately 1/20 of experiments, the record
for one run is missing. Thus, for experiments with nominally 100 runs, occasionally only 99 runs were
recorded.
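Solving Eq. (3) for the phase-gate error gives $E_G = \frac{3}{4}\left[1 - (1 - 4E'_s/3)/(1 - 4E_s/3)\right]$. The helpers below evaluate that expression and its forward direction; the function names are hypothetical. As an arithmetic consistency check only, combining $E_s = 0.162$ with $E_G = 0.069$ through Eq. (3) implies $E'_s \approx 0.216$, and solving back recovers $E_G = 0.069$. The value of $E'_s$ here is computed, not a measured result.

```python
def epg_from_epos(e_s, e_s_prime):
    """Solve (1 - 4E'_s/3) = (1 - 4E_s/3)(1 - 4E_G/3) for E_G (two qubits)."""
    return 0.75 * (1 - (1 - 4 * e_s_prime / 3) / (1 - 4 * e_s / 3))


def combined_epo(e_s, e_g):
    """Forward direction of Eq. (3): error per step once G is appended to each step."""
    return 0.75 * (1 - (1 - 4 * e_s / 3) * (1 - 4 * e_g / 3))


e_s_prime = combined_epo(0.162, 0.069)
print(round(e_s_prime, 6), round(epg_from_epos(0.162, e_s_prime), 6))
```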

It is helpful to run randomized benchmarks on subsets of the available qubits so that
results can be compared to other experimental platforms that have different numbers of
available computational qubits, and for investigating differences in behavior that depend on
(for example) geometrical relationships between qubits. If possible, these benchmarks should
be run in parallel on disjoint subsets. For these reasons, we checked the performance of the
one-qubit gates in parallel on the two ion qubits. Because of the pre-existing benchmarks, we
did not implement the above protocol for each qubit, but used a one-qubit benchmarking
protocol similar to that of Ref. [5]. Briefly, the length of a sequence is the number of
steps that consist of a Pauli gate ($\pi$-pulse) followed by a Clifford gate ($\pi/2$-pulse) on each
qubit. Each step can be thought of as implementing a random computational gate. The
gate sequence is followed by a Pauli gate and Clifford gate chosen to yield a predictable
measurement outcome in the Z basis for each qubit. The Pauli gates are chosen with equal
probability to be $\pi$ rotations about the $x$, $y$ or $z$ axis or the identity. The Clifford gates
are chosen with equal probability from the following five options: $R_x(\pm\pi/2)$, $R_y(\pm\pi/2)$,
or the identity. When many subsequent gates are composed together, this distribution of
Clifford gates converges favorably to a uniformly random Clifford unitary
in comparison with the distribution in Ref. [5]. The introduction of identity gates into the
Clifford gate step reduces the average expected number of $\pi/2$-pulses in that step from 1 to
0.8.
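One way to see the effect of the identity option is to work modulo Pauli operators. There, a one-qubit Clifford unitary reduces to a permutation of the three axes $x, y, z$, because the uniform Pauli randomization absorbs the signs. A $\pm\pi/2$ rotation about $x$ swaps the $y$ and $z$ axes, and one about $y$ swaps $x$ and $z$; both are odd permutations. The sketch below propagates the distribution over the six axis permutations step by step and reports its total-variation distance from uniform; names are hypothetical. With the five-option distribution described above, the distance should decay geometrically. The comparison distribution without the identity option is a hypothetical choice made for this illustration. It locks the permutation parity to the sequence length, so its distance never drops below 1/2. The sketch also confirms the mean of 0.8 $\pi/2$-pulses per step.

```python
from fractions import Fraction
from itertools import permutations

AXES = "xyz"
IDENTITY = tuple(AXES)  # permutations are stored as the images of (x, y, z)


def swap(a, b):
    """Axis permutation exchanging a and b (a +-pi/2 rotation modulo Paulis)."""
    return tuple(b if c == a else a if c == b else c for c in AXES)


def compose(p, q):
    """Permutation 'apply q, then p'."""
    return tuple(p[AXES.index(q[i])] for i in range(3))


# R_x(+-pi/2) -> swap(y, z); R_y(+-pi/2) -> swap(x, z); identity with probability 1/5.
FIVE_OPTIONS = {IDENTITY: Fraction(1, 5), swap("y", "z"): Fraction(2, 5), swap("x", "z"): Fraction(2, 5)}
# Hypothetical comparison: the same rotations but without the identity option.
NO_IDENTITY = {swap("y", "z"): Fraction(1, 2), swap("x", "z"): Fraction(1, 2)}


def evolve(step_dist, steps):
    """Distribution of the accumulated Clifford (modulo Paulis) after a number of steps."""
    dist = {IDENTITY: Fraction(1)}
    for _ in range(steps):
        new = {}
        for p, w in dist.items():
            for g, v in step_dist.items():
                key = compose(g, p)
                new[key] = new.get(key, 0) + w * v
        dist = new
    return dist


def tv_from_uniform(dist):
    """Total-variation distance from the uniform distribution on the 6 permutations."""
    return sum(abs(dist.get(p, 0) - Fraction(1, 6)) for p in permutations(AXES)) / 2


mean_half_pi_pulses = sum(w for g, w in FIVE_OPTIONS.items() if g != IDENTITY)
for steps in (1, 5, 10, 20):
    print(steps, float(tv_from_uniform(evolve(FIVE_OPTIONS, steps))),
          float(tv_from_uniform(evolve(NO_IDENTITY, steps))))
print(mean_half_pi_pulses)
```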
III. EXPERIMENTAL IMPLEMENTATION



We perform the benchmarking demonstration with the ion-trap system described in [20, 32, 33],
using updated techniques. This system includes most of the features of the scalable
quantum computing architecture of [23, 24]. We trap four ions in a six-zone linear Paul trap:
two ⁹Be⁺ ions that serve as the qubits, and two ²⁴Mg⁺ ions that are used for sympathetically
recooling the qubit ions during the sequences. The ions form a linear chain along the axis of
the trap, which is the axis of weakest confinement. The two-qubit phase gates are performed
with all four ions in the same trap zone, in the order ⁹Be⁺–²⁴Mg⁺–²⁴Mg⁺–⁹Be⁺ (Fig. 2,
bottom left). Individual addressing of the ions for one-qubit rotations is achieved by
separating the ions into two trap zones 0.37 mm apart with a single ⁹Be⁺–²⁴Mg⁺ pair in each
zone (Fig. 2, below electrodes).

The qubit states are the $|F = 1, m_F = 0\rangle = |{\uparrow}\rangle$ and $|2, 1\rangle = |{\downarrow}\rangle$ hyperfine states of
⁹Be⁺, where $F$ and $m_F$ are the total angular momentum quantum numbers. The energy
difference between these states is first-order insensitive to magnetic-field fluctuations at the
applied field of 0.011964 T [20, 33]. At the beginning of each experiment we prepare the
⁹Be⁺ ions in the $|{\downarrow\downarrow}\rangle$ state. At the end of each experiment, we detect the qubit states by
transferring the $|{\downarrow}\rangle$ and $|{\uparrow}\rangle$ states to the $|2, 2\rangle$ and $|1, -1\rangle$ states, respectively, and then
apply a $\sigma^+$-polarized laser beam that is directed to trap zone A (Fig. 2) and resonant with
the $S_{1/2}|2, 2\rangle \leftrightarrow P_{3/2}|3, 3\rangle$ cycling transition. The presence (absence) of ion fluorescence
observed with a photomultiplier tube indicates the $|{\downarrow}\rangle$ ($|{\uparrow}\rangle$) state. For a single ⁹Be⁺ ion,
the average number of photons collected in 250 μs is typically 30 for the $|2, 2\rangle$ state and 1.5
for the $|1, -1\rangle$ state (limited by stray light). This allows us to analyze each detection individually
with a threshold detection level of around 11 counts. To measure both qubits we first detect
the state of the left qubit while the other is held in trap zone B (Fig. 2). Then we optically
pump the left qubit to the $|2, 2\rangle$ state and transfer it into the "dark" $|1, -1\rangle$ state. Finally,
we bring both qubit ions into trap zone A and apply the same procedure to detect the state
of the right qubit.
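As a rough illustration of why a threshold near 11 counts separates the two cases, the sketch below treats the photon counts as Poisson distributed with the typical means quoted above: 30 for the bright state and 1.5 for the dark state. Both the Poisson model and the convention that more than 11 counts is read as "bright" are assumptions. Effects not modeled here would add to the actual readout error. Names are hypothetical.

```python
import math


def poisson_cdf(k, mean):
    """P(N <= k) for a Poisson random variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for n in range(1, k + 1):
        term *= mean / n
        total += term
    return total


THRESHOLD = 11  # assumed convention: counts above this are read as bright (|2,2>, i.e. down)
bright_mean, dark_mean = 30.0, 1.5

p_bright_read_dark = poisson_cdf(THRESHOLD, bright_mean)
p_dark_read_bright = 1 - poisson_cdf(THRESHOLD, dark_mean)
print(f"{p_bright_read_dark:.2e} {p_dark_read_bright:.2e}")
```

Under this idealized model both misclassification probabilities are far below the error rates being benchmarked. This is consistent with analyzing each detection individually with a single threshold.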

One-qubit rotations about a vector in the $x$–$y$ plane are implemented with the "co-carrier"
laser beams (Fig. 2) that cause stimulated-Raman $|{\downarrow}\rangle \leftrightarrow |{\uparrow}\rangle$ transitions on the
⁹Be⁺ ions after they are separated and held in different trap zones [20, 32, 33]. Specifically,
the carrier transitions perform $R(\theta, \phi) = e^{-i\theta\sigma_\phi/2}$, where $\sigma_\phi = \cos(\phi)\sigma_x + \sin(\phi)\sigma_y$, and
$\phi$ depends on the phase difference of the laser beams at the position of the ion(s). For
co-carrier transitions (on-resonance qubit transitions), two co-propagating laser beams have
frequencies $f_L$ and $f_L + f_0$ (see Fig. 2(b)(i)), where $f_L$ is the principal laser frequency at
957.132 THz, which is approximately 70 GHz below the $S_{1/2}$ to $P_{1/2}$ transition frequency,
and $f_0 = 1.207353$ GHz is the qubit transition frequency. The co-carrier laser beam position
along the trap axial direction is controlled with an acousto-optic deflector that allows the
beam to address ions in either trap zone. The pulse durations for single-qubit rotations by
$\pi$ and $\pi/2$, using the co-carrier beam configuration shown in Fig. 2(b)(i), were approximately 9
and 4.5 μs, respectively. One-qubit $\sigma_z$ gates ($R_z(\phi) = e^{-i\phi\sigma_z/2}$) are implemented in software
by shifting the RF phase of all future rotations for that qubit by $-\phi$. Identity gates are
implemented with a wait time equal to 4 μs.

FIG. 2. Experimental setup. (a) Schematic showing the two trapping zones, ion positions, and
laser beam paths used (not to scale). Ions are trapped in trap zones A and B. An electrode X
between these trap zones is used to separate and recombine ions [32, 33]. To perform one-qubit
rotations the ions are separated such that a ⁹Be⁺–²⁴Mg⁺ pair is trapped in each zone (depicted
directly under the electrodes). To perform the entangling gate, all four ions are combined in
trap zone A (bottom left). The beam waists are approximately 25 μm in the vertical direction and
30 μm along the trap axial direction, which is large compared to the extent of the two-ion and
four-ion crystals (6 μm and 11 μm). (b) Laser beam configurations and frequencies used for different
operations. (i) Two co-propagating beams induce Raman carrier transitions in either trap zone,
used for single-qubit gates. An acousto-optic deflector is used to direct the co-carrier to either trap
zone. (ii) The 90° beam is directed to trap zone A at 90° with respect to the co-carrier beam paths
such that the wave-vector difference is along the trap axis. These beams induce carrier transitions
used as part of the phase gate G. (iii) Three beams induce the Mølmer–Sørensen gate used as part
of G. Beams with different frequencies, depicted as slightly displaced arrows, are actually overlapped
in the experiment. Details and frequency definitions are provided in the text.

One laser beam propagating along the co-carrier beam path and a pair of laser beams 
propagating along the 90° beara path (Fig. [2]) are used to implement phase gates G. In con- 



trast to the experiments in 20|, |32|, |33| , which implemented two-qubit G phase gates directly 



on hyperfine states, we use one-qubit rotations and a M0lmer-S0rensen (MS) gate [2^ to 
implement G |25l]. In the previous experiments, implementation of the phase gate required 
that the qubit ions' states be transferred from the qubit manifold, where the qubit frequency 



is first-order independent of magnetic-field fluctuations, to other hyperfine states 20|, l33 | 



However, the MS gate can be performed directly on the qubit states. To implement G, we
surround an MS gate pulse, $U_{MS} = e^{-i\frac{\pi}{4}\sigma_\phi^{(1)}\sigma_\phi^{(2)}}$, with carrier
π/2-pulses on both ions by use of two laser beams as shown in Fig. 2(d)(ii) and (iii). The
resulting three-pulse sequence is

$$e^{-i\frac{\pi}{4}\left(\sigma_{\phi_+}^{(1)}+\sigma_{\phi_+}^{(2)}\right)}\,U_{MS}\,e^{-i\frac{\pi}{4}\left(\sigma_{\phi_-}^{(1)}+\sigma_{\phi_-}^{(2)}\right)} = G\,e^{-i\pi/4},$$

where we use φ± = φ ± π/2 and where the overall phase factor after G has no physical
consequence in this setting. The advantage of using G as our elementary two-qubit gate
rather than the MS gate is that this three-pulse sequence is insensitive to slow changes in the
optical path-length difference of the non-copropagating beams, which cause φ to change [25].
The duration of the U_MS pulse is 20 μs, and the duration of each carrier π/2 pulse, using the
beam configuration shown in Fig. 2(d)(ii), is approximately 1.5 μs. Because of wait periods
between pulses that are necessary to stabilize the feedback loops that control the laser pulse
amplitudes and phases, the three-pulse sequence requires 110 μs to complete. Before
performing each G gate, we sympathetically laser-cool the four-ion crystal, first using Doppler
and then Raman sideband cooling of the ²⁴Mg⁺ ions [20, 32, 33]. This ensures that each time
we implement G, the motional modes along the axial direction are cooled to near the ground
state. The cooling light interacts only with ²⁴Mg⁺ and thus preserves the qubit state coherences.
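The claimed insensitivity of the three-pulse sequence to the MS phase φ can be checked numerically. The sketch below is illustrative, not the authors' code. It assumes σ_φ = cos φ σ_x + sin φ σ_y, carrier π/2-pulses of the form exp(−iπσ_{φ±}/4), and G = diag(1, i, i, 1). The function names are hypothetical.

```python
import numpy as np

# Pauli matrices
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)

def sigma(phi):
    """Equatorial Pauli operator cos(phi) X + sin(phi) Y (assumed convention)."""
    return np.cos(phi) * SX + np.sin(phi) * SY

def exp_pauli(theta, P):
    """Return exp(-i theta P) for an operator P that squares to the identity."""
    return np.cos(theta) * np.eye(P.shape[0]) - 1j * np.sin(theta) * P

def three_pulse_sequence(phi):
    """pi/2 pulses about phi_- on both ions, then U_MS, then pi/2 pulses about phi_+."""
    s = sigma(phi)
    u_ms = exp_pauli(np.pi / 4, np.kron(s, s))
    r_plus = exp_pauli(np.pi / 4, sigma(phi + np.pi / 2))
    r_minus = exp_pauli(np.pi / 4, sigma(phi - np.pi / 2))
    # Matrix products act right to left, so the r_minus pulses are applied first.
    return np.kron(r_plus, r_plus) @ u_ms @ np.kron(r_minus, r_minus)

G = np.diag([1, 1j, 1j, 1])  # assumed form of the phase gate

for phi in np.linspace(0, 2 * np.pi, 7):
    ok = np.allclose(three_pulse_sequence(phi), np.exp(-1j * np.pi / 4) * G)
    print(f"phi = {phi:.2f} rad: sequence equals G e^(-i pi/4): {ok}")
```

Under these conventions, the bare MS pulse changes with φ but the full product does not. This is why slow drifts of the optical path-length difference drop out of G.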

In more detail, the MS gate requires the simultaneous application of detuned blue and
red sidebands. To achieve this, we overlap three laser beams with different frequencies in
trap zone A (Fig. 2(d)(iii)). One laser beam propagates along the co-carrier beam path with
frequency f_L. The other two beams co-propagate in the 90° beam path at frequencies of
f_L + f_0 ± (f_z + δ), where f_z is the frequency of a motional mode and δ ≪ f_z is a detuning.
The two laser beams are derived from a single beam that is split and passed through different
double-pass acousto-optic deflectors such that they end up with a frequency difference of
2(f_z + δ). The split beams are then recombined on a 50-50 beam splitter with one port
directed to the ions and the other going to a photodetector that is used to measure and
stabilize the phase of the beat note, as required to realize U_MS [25]. To implement G,
we simultaneously address the two highest-frequency axial motional modes of the four-ion
crystal at f_z = 5.487 MHz and f_z' = 5.739 MHz [32]. The detuning δ must be chosen such
that the detuning from one mode is an integer multiple of the detuning from the other, in
order to fully disentangle both motional states from the qubit states at the end of the gate.
Experimentally, we found a detuning of δ = 50 kHz above f_z to be optimal given our laser
beam intensities, which implies that the MS gate was implemented with one phase-space
loop on the f_z mode and four loops on the f_z' mode [22]. In Fig. 3 we plot the observed
fraction of both ions in the |↓⟩ state (red squares), both ions in the |↑⟩ state (blue circles),
and one ion in each state (green triangles) as a function of the duration of the red and
blue sideband pulses applied to an initial state of |↓⟩_1|↓⟩_2. The MS gate is completed in
approximately 20 μs.
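The loop condition can be checked against the quoted mode frequencies. This sketch assumes the usual picture in which a sideband drive detuned by d from a mode traces one closed loop in phase space every 1/d. The helper name is hypothetical.

```python
# Mode frequencies and detuning quoted in the text (Hz)
F_Z = 5.487e6        # lower addressed axial mode
F_Z_PRIME = 5.739e6  # higher addressed axial mode
DELTA = 50e3         # sideband detuning above F_Z

def ms_loop_check(f_low, f_high, delta, loops_low=1):
    """Gate time and phase-space loops for a sideband drive at f_low + delta.

    Assumes a drive detuned by d from a mode closes one loop every 1/d.
    """
    f_drive = f_low + delta
    d_low = f_drive - f_low       # detuning from the lower mode
    d_high = f_high - f_drive     # detuning from the higher mode
    t_gate = loops_low / d_low    # time to close loops_low loops on the lower mode
    loops_high = d_high * t_gate  # loops completed on the higher mode meanwhile
    return t_gate, d_high, loops_high

t_gate, d_high, loops_high = ms_loop_check(F_Z, F_Z_PRIME, DELTA)
print(f"gate time {t_gate * 1e6:.1f} us, detuning from f_z' {d_high / 1e3:.0f} kHz, "
      f"loops on f_z' {loops_high:.2f}")
```

With the quoted values, this gives a 20 μs gate time and a 202 kHz detuning from f_z'. That corresponds to 4.04 loops on the f_z' mode, close to the integer ratio of four stated above, consistent with the measured gate time.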

For the Clifford and phase-gate benchmarks, we generated random sequences of lengths
1, 2, 3, 4, 5, 6. The respective numbers of sequences implemented were 45, 55, 53, 39, 28, 15 for
the Clifford benchmark and 46, 54, 53, 38, 28, 15 for the phase-gate benchmark, in order of
sequence length. Each time we performed a sequence for the Clifford benchmark, we then
immediately performed the corresponding phase-gate benchmark sequence. Each sequence
was implemented 100 times. From the measurement outcomes, we determined the fraction
of measurements that matched the prediction. The data shown in Fig. 4 were obtained in four
successive sets of experiments on the same day. During and between the sets we periodically
recalibrated the magnetic field and the laser frequencies needed for sympathetic cooling of
²⁴Mg⁺, but not the other pulse parameters. Within each set, the sequences were randomized
with respect to length. However, the first (second, third, fourth) set involved sequences of
lengths 1 to 3 (1 to 4, 1 to 5, 1 to 6, respectively). In particular, sequences of length 6 were run
only in the last set of experiments, which is why fewer sequences of length 6 contribute to
the data. During the experimental runs for these sets, we observed one ion-loss event. We
did not implement longer sequences because ion-loss events became a problem for lengths
greater than 6.

FIG. 3. Mølmer-Sørensen gate. We simultaneously apply detuned red and blue sideband MS-gate
beams to an initial |↓⟩|↓⟩ state for varying durations and observe the frequency with which
we find both ions in the |↓⟩ state (red squares), both ions in the |↑⟩ state (blue circles), or one
ion in each state (green triangles). From these curves, we can determine the gate time for
the MS gate. Here it is approximately 20 μs, at which point the qubit states are entangled and
ideally in the state (1/√2)(|↓⟩|↓⟩ + e^{iφ}|↑⟩|↑⟩), where φ depends on the phases of the laser beams
at the ions' position and can vary from experiment to experiment (see text). The solid lines show
the theoretical results for an ideal gate. To perform a phase gate, we surround the MS-gate pulse
with two π/2-pulses by use of the laser beams as depicted in Fig. 2(d)(ii). The points and their
error bars were determined by photon-count histogram fitting from 250 runs.



In our implementation, a Clifford unitary took 4.5 ms on average. For the sequences 
with an extra G gate inserted after each step, 7.5 ms per step was typical. The most time- 
consuming elements of the sequence implementations were the sympathetic recooling of 
the ions after each recombination of the ions into trap zone A, followed by the separation 
and recombination processes. Each sequence began with approximately 10 ms for state 
preparation and laser lock stabilization. Thus, a sequence of length 6 with an extra G 
inserted at each step lasted approximately 55 ms. Longer sequences resulted in an accelerated 
rate of ion loss events (on the order of a loss event per minute), which can likely be attributed 
to a decreased probability of recovery from background gas collisions that can occur at any 
point during the sequences. Before running each sequence, two warmup sequences with 
100 experiments each were run to make sure the experiment was in a steady state; the 
results of these experiments were not recorded. Switching from one sequence to the next 
required 3 s to 4 s of computer time to reprogram the control hardware. In total, all of 
the Clifford benchmarks, including the sequences used to benchmark G, were completed in 
approximately 1 hour and 45 minutes, which also includes the time durations needed for 
periodic recalibrations of the magnetic field as well as the time period to reload a set of ions 
following the only ion-loss event. 
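The sequence durations quoted above follow from a simple linear model. The sketch below assumes a sequence costs a fixed 10 ms overhead plus a fixed time per step. The two warmup sequences and the 3 s to 4 s reprogramming time are not included. The names are hypothetical.

```python
PREP_MS = 10.0             # state preparation and laser-lock stabilization
CLIFFORD_STEP_MS = 4.5     # average duration of one Clifford unitary
INTERLEAVED_STEP_MS = 7.5  # typical step with an extra G inserted

def sequence_duration_ms(length, interleaved=False):
    """Approximate duration of one sequence: fixed overhead plus time per step."""
    step = INTERLEAVED_STEP_MS if interleaved else CLIFFORD_STEP_MS
    return PREP_MS + length * step

for length in range(1, 7):
    print(length, sequence_duration_ms(length), sequence_duration_ms(length, interleaved=True))
```

For a length-6 sequence with an extra G at each step, this gives 10 + 6 × 7.5 = 55 ms, matching the figure in the text.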



The parallel one-qubit benchmark whose results are shown in Fig. 5 was executed in one
set, after all of the two-qubit benchmarks and following a recalibration of the one-qubit
gates. The number of sequences implemented was 15, 13, 6, 13, 12, 14 for sequence lengths
of 2, 3, 4, 6, 8, 12, respectively. We ran each sequence 100 times, as before. In order to
approximately replicate the conditions of the experiment for the two-qubit benchmark, in 
each step, the ions were recombined into a single trap zone, recooled and then held for 
approximately the same duration required to execute G before being separated again for the 
next sequence step. 
IV. EXPERIMENTAL RESULTS

The red data points and curve in Fig. 4 show the results from the experimental Clifford
gate benchmark and their fit to an exponential decay. The fit gives a Clifford unitary
EPO of ε_g = 0.162(8).

The blue data points and curve in Fig. 4 show the results from the G benchmark. Curve-fitting
and solving the above equations for ε_G give an EPG of ε_G = 0.069(17).

We determined the errors per step on each qubit independently with the parallel one-qubit
benchmarks explained above. The results from the benchmarks are shown in Fig. 5.
The inferred one-qubit errors per step are 0.010(2) and 0.007(2) for the respective qubits.
Assuming that laser pulses dominate the error per step, these results can be
compared to the protocol of Ref. [5] through multiplication by the ratio 2 : 1.8 of π/2-pulses
per step in the two protocols.



V. DATA ANALYSIS METHODS 

In the limit of very large numbers of sequences for each length, we can use a simple,
nonlinear, weighted-least-squares fit of Eq. (1) to the fidelity curves as a function of length.
The weights are determined by the standard error of the mean for the fidelity at each length. 
Note that some non-linear least-squares fitting functions compute the error in the inferred 
parameters from the fitting error and ignore the scale of the errors implied by the weights 
given to the individual points. Because we already have a good estimate of the standard 
errors of these means, a better estimate of the error in the inferred parameters can be 
obtained by direct propagation of errors, particularly when there are few points. 

FIG. 4. Randomized benchmarking of two-qubit gates. The red circles show one minus the
average probability E(l) of measuring an error at the end of sequences of random Clifford
unitaries, as a function of the sequence length l. By fitting the data to the expression in Eq. (1)
(red line), we find an error per random Clifford unitary ε_g = 0.162(8). The preparation/measurement
error, ε_m, is 0.086(22) (recall that the measurement error includes the error for an additional
inverting gate before detection). Blue squares show the results for running random sequences
with an additional G inserted after each step. Fitting these data to Eq. (2) yields an error of
ε_G = 0.069(17). In this case the preparation/measurement error, ε'_m, is 0.132(26). The error bars
in the plot represent the standard deviation of the mean of the sequences' frequency of correct
measurement outcome. Error bars for inferred parameters are based on bootstrap resampling;
see the text.

FIG. 5. Randomized benchmarking of one-qubit gates. Red circles and blue squares show
one minus the average probability of error for each qubit independently. The solid lines are the
best fits of the data to $E(l) = \frac{1}{2} - \frac{1}{2}(1-2\epsilon_m)(1-2\epsilon_g)^l$, where l is the sequence
length, ε_m is related to the state preparation and readout fidelities of the two qubits, and ε_g is the
error per step in the sequence. We find the errors per step to be 0.010(2) and 0.007(2), respectively.

For smaller numbers of sequences for each length, we must consider that we know little
about the distribution of the fidelities for different random sequences of a given length.
This distribution is affected not only by the differences in actual pulses applied, but also by
factors such as the amount of coherence in error (see below). However, the estimate of a
given sequence's fidelity from the 100 experimental runs is binomially distributed. Thus we
used a partially parametric bootstrap procedure [34] to determine a standard error for the
parameters inferred by fitting. Let n_l be the number of different sequences of length l used in
the experiment. Denote the experimentally measured fidelity of the j'th sequence of length l
as F(l, j), which is the fraction of times the correct result was obtained during the 100 runs
of the j'th sequence. We generated artificial data for each bootstrap resample as follows. For
each length l, we constructed F_r(l, k), k = 1, ..., n_l, by letting F_r(l, k) be a random element
of the sequence of fidelities F(l, j), j = 1, ..., n_l, picked independently (with replacement,
that is, the same element can be picked multiple times) for different k. For each F_r(l, k),
we generated a random F'(l, k) according to a binomial distribution, so that F'(l, k) was
the fraction of 1s in 100 random instances of a 0/1 variable where the probability of 1 was
F_r(l, k). We then averaged the F'(l, k) for each length l and computed the fit in the same
way as it was computed for the real data to obtain the inferred parameters for this resample.
This resampling procedure was repeated 1000 times, yielding 1000 resampled values for each
of the parameters. The standard errors in the reported parameters are determined as the
square root of the variances of the corresponding resampled parameters. The values of the
parameters are still the ones from the fit of the original experimental data, as this is the less
biased estimate and requires no bootstrapping.
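The resampling procedure above can be sketched as follows. This is not the authors' code. For brevity it substitutes an unweighted log-linear least-squares fit of the two-qubit (n = 2) decay for the weighted nonlinear fit used on the real data. The function names and the synthetic data are hypothetical.

```python
import numpy as np

N_RUNS = 100  # experimental runs per sequence, as in the text

def fit_two_qubit_decay(lengths, mean_fid):
    """Simplified fit of 1 - E(l) = 1/4 + (3/4) * A * p**l for n = 2.

    Stand-in for the weighted nonlinear fit: a straight-line least-squares
    fit of log(F - 1/4) against l. Returns (eps_g, eps_m).
    """
    y = np.log(np.asarray(mean_fid, dtype=float) - 0.25)
    slope, intercept = np.polyfit(np.asarray(lengths, dtype=float), y, 1)
    p = np.exp(slope)               # 1 - 4 eps_g / 3
    a = np.exp(intercept) / 0.75    # 1 - 4 eps_m / 3
    return 0.75 * (1 - p), 0.75 * (1 - a)

def bootstrap_errors(fid_by_length, n_resamples=1000, seed=None):
    """Partially parametric bootstrap of the fitted (eps_g, eps_m)."""
    rng = np.random.default_rng(seed)
    lengths = sorted(fid_by_length)
    resampled = []
    for _ in range(n_resamples):
        means = []
        for l in lengths:
            f = np.asarray(fid_by_length[l], dtype=float)
            # Nonparametric step: draw n_l sequence fidelities with replacement.
            f_r = rng.choice(f, size=f.size, replace=True)
            # Parametric step: fraction of successes in N_RUNS binomial trials.
            f_prime = rng.binomial(N_RUNS, f_r) / N_RUNS
            means.append(f_prime.mean())
        resampled.append(fit_two_qubit_decay(lengths, means))
    # Standard error = square root of the variance over the resamples.
    return np.sqrt(np.var(np.array(resampled), axis=0, ddof=1))

# Synthetic example (not the experimental data): every sequence of length l
# has the model fidelity for eps_g = 0.16, eps_m = 0.09, with the sequence
# counts used for the Clifford benchmark.
def model_fid(l, eps_g=0.16, eps_m=0.09):
    return 0.25 + 0.75 * (1 - 4 * eps_m / 3) * (1 - 4 * eps_g / 3) ** l

counts = dict(zip(range(1, 7), [45, 55, 53, 39, 28, 15]))
data = {l: np.full(n, model_fid(l)) for l, n in counts.items()}
print("fit:", fit_two_qubit_decay(sorted(data), [model_fid(l) for l in sorted(data)]))
print("bootstrap standard errors:", bootstrap_errors(data, n_resamples=200, seed=1))
```

The central values always come from the fit to the original data. The resamples only supply the spread, matching the procedure described above.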

VI. CONSISTENCY CHECKS 

Although the relationship between EPOs and physical errors in gates is not known in 
general, specific benchmarking protocols provide well-defined EPOs that can be compared 
across platforms. However, for an implementation of the benchmark to be convincing, there 
are several assumed or expected properties that can be checked. These include the following: 
We can determine whether or not the fidelity curves are consistent with a simple exponential 
as a function of sequence length, and if not, analyze the deviations. Given that we know the 
implementations of the Clifford unitaries, we can compare the EPO for a Clifford unitary 
with that inferred from the EPG for a phase gate and the one-qubit benchmark. 

First we consider the exponential fits shown in Fig. 4. The χ² values (four degrees of
freedom) for the two curves are 9.28 and 9.48, respectively. The higher one corresponds
to a p-value of 0.0501, approximately the conventional boundary for significance. There is
other evidence that the exponential model may not be a good fit. First, the two
preparation/measurement errors are expected to be the same, but the fits seem to suggest otherwise,
although the statistical significance is not strong. Second, both sets of data seem to dip below
the fit near the end. Together these observations suggest an increased EPO for later Clifford
unitaries. Indeed, dropping the first points from the analysis suggests higher EPOs. For
example, the fits for the last four and three sequence lengths have EPOs of ε_g = 0.185(20)
and ε_g = 0.237(25), respectively. The corresponding EPGs are 0.120(44) and 0.090(100).
Note that the second values are from a two-parameter fit to three points, which reduces
their significance. The fits to the first four sequence lengths give an EPO of 0.144(11) and
an EPG of 0.071(20). The higher EPOs for longer sequences could be related to the fact 
that unlike the shorter ones, they were run only in the last or last two sets of experiments, 
without recalibrating pulse parameters except as noted above. We attempted to confirm this 
hypothesis by analyzing the results for sequence lengths one to three separately for each set. 
The results of this analysis are consistent with a drop but have insufficient signal-to-noise 
to be conclusive. 
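The quoted p-value can be reproduced from the χ² statistic. For an even number of degrees of freedom, the χ² upper-tail probability has a closed form. The sketch below uses it to avoid depending on a statistics library. The function name is hypothetical.

```python
import math

def chi2_sf_even(x, dof):
    """Upper-tail probability P(chi^2 >= x) for an even number of degrees of freedom.

    Closed form: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)**i / i!
    """
    if dof % 2:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Four degrees of freedom: six sequence lengths minus two fitted parameters.
for chi2 in (9.28, 9.48):
    print(chi2, round(chi2_sf_even(chi2, dof=4), 4))
```

For χ² = 9.48 this gives about 0.050. The last-digit difference from the quoted 0.0501 is consistent with rounding of the χ² value.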

In principle, we would like the platform to have the property that errors reach stationary 
behavior soon after state preparation, and the benchmark's reported EPO should reflect the 
stationary error. As noted above, we were not able to implement long enough or sufficiently 
many sequences to clearly observe stationary behavior, or to determine the extent to which 
the behavior is nonstationary. The EPO and EPG reported in the abstract are determined 
from all six lengths tested — given the "early" and "late" values above, we believe that they 
are a good representation of mid-length behavior of gate errors. Our inability to consistently 
run sequences of length longer than six prevents any claims of stationary behavior. 

Other issues with the exponential fits that can arise include the possibility that the curves 
are a mixture of exponentials, as would be expected if the EPOs change slowly compared to 
the time required to run a sequence. In this case, the apparent EPOs would tend to decrease 
with increasing sequence length, as the higher-EPO runs affect the loss at shorter lengths, 
but tail behavior is dominated by the slowest decay. Given the observations of the previous 
paragraph and the available statistics, we cannot usefully test for this possibility. 

Now we consider consistency between the measured EPOs and EPGs. We estimate the 
EPO that we should have measured given the EPGs obtained from the one-qubit and the 
phase gate benchmarks. For this estimate, we count the complexity of sequences of one-qubit
pulses in terms of the number of effective π/2-pulses applied. This counts only pulses
around the ±x or ±y axes, taking into consideration that z-axis pulses and identity gates
are essentially error-free. The π-pulses are counted as two π/2-pulses. Coherent error addition
is neglected. The one-qubit benchmark's steps each have an average of 1.8 effective π/2-pulses
per qubit. If we use 0.0085 as a representative error probability per step from the one-qubit
benchmark (Fig. 5), we obtain ε_1 = (6/5) × 0.0085/1.8 = 0.0057 as the linearized error
probability per one-qubit π/2-pulse. The factor of 6/5 converts the average probability of error
for one qubit to that for two qubits under the assumptions that the other qubit has no error
and any Pauli error is twirled to a depolarizing error. For the purpose of this calculation, we
take the error probability per phase gate to be 0.069. Each step in the Clifford benchmark
has an average of 6.5 effective π/2-pulses and 1.5 phase gates. The linearized error probability
for a step can therefore be estimated as 1.5 × 0.069 + 6.5 × 0.0057 = 0.14, with a standard
error of about 0.02, if we add the statistical errors in quadrature. This linear approximation 
is expected to give a pessimistic estimate, but in this case, the nonlinear correction is smaller 
than the error in the estimate. While our estimate gives a value below the measured EPO, 
the difference appears not to be statistically significant. We emphasize that the above 
strategy for estimating the EPO from EPGs neglects coherent error addition, which tends 
to increase the error, and internal error cancellation that could arise from the way pulses 
are combined within a step. 
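The arithmetic of this estimate is sketched below. We read the factor 6/5 as the ratio of the average-infidelity factors d/(d + 1) for d = 4 and d = 2. This reproduces 6/5 for a twirled Pauli error acting on one qubit only. That reading, and the function names, are our assumptions.

```python
def infidelity_factor(d):
    """Average gate infidelity per unit twirled Pauli error probability: d / (d + 1)."""
    return d / (d + 1)

# One-qubit error expressed as a two-qubit error with an error-free partner: (4/5) / (2/3) = 6/5
ONE_TO_TWO_QUBIT = infidelity_factor(4) / infidelity_factor(2)

def estimate_clifford_epo(eps_1q_step=0.0085, pulses_per_1q_step=1.8,
                          eps_phase_gate=0.069, pulses_per_step=6.5,
                          gates_per_step=1.5):
    """Linearized, incoherent estimate of the error per Clifford step."""
    eps_pulse = ONE_TO_TWO_QUBIT * eps_1q_step / pulses_per_1q_step
    return eps_pulse, gates_per_step * eps_phase_gate + pulses_per_step * eps_pulse

eps_pulse, eps_step = estimate_clifford_epo()
print(f"per effective pi/2-pulse: {eps_pulse:.4f}; per Clifford step: {eps_step:.3f}")
```

The result, about 0.0057 per π/2-pulse and 0.14 per step, matches the numbers in the text. It sits below the measured EPO of 0.162(8).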

VII. ESTIMATES OF PHYSICAL SOURCES OF ERROR 

We consider known sources of errors and estimate their contribution to the EPG of G. 
Spontaneous emission is a fundamental source of error for transitions driven by stimulated
Raman processes; here the laser beams are tuned approximately 70 GHz below the P1/2
state [35, 36]. Simulations for our laser parameters indicate that this should contribute an error
probability of 0.001 to a one-qubit π-pulse and 0.013 to the phase gate. (Recall that the
phase gate consists of an MS gate surrounded by π/2-pulses.)

Errors can also arise from imperfect calibrations and slow drifts of the gate parameters. 
These parameters include beam intensities, frequencies, phases, and pulse durations. These 
types of errors are coherent in the sense that for any given run, each implemented gate still 
causes a unitary change in state, but not exactly the intended one. To determine whether 
such errors contribute significantly to the measured EPOs, we consider the variation in 
fidelities for different sequences of a given length. Coherent error contributions typically 
result in a variation that is larger than that expected from a simple statistical analysis [5].
For our experiment and in the absence of coherent errors, we attribute the largest sources
of variation in the fidelities of Fig. 4 to the varying number of G gates and one-qubit
rotations needed to implement each step's random Clifford unitary, and to the binomial
statistics for the fidelities inferred from the 100 runs of each sequence. Fig. 6 compares
the actual variation and the variation predicted from the statistics of the number of phase
gates per step and the binomial statistics. We did not include the variation in numbers of
one-qubit gates due to their significantly smaller error. The gate statistics and binomial 
statistics are independent, so their contributions were added in quadrature. The predicted 
variation is generally somewhat less than the measured one but does not indicate coherence 
of the errors in a given sequence because our simple model does not account for all incoherent 
effects. 

The main sources of coherent errors are due to drifts in beam intensity, relative laser field 
phases for the beams implementing the phase gate, and Stark-shifted frequencies. These 
drifts result in errors in pulse duration and in phase and frequency calibration, each of which is
estimated to contribute approximately equally to the EPG. We estimate that their total
contribution to the phase gate EPG is less than 0.03. 

There are a number of lesser sources of error to consider. In addition to the slow intensity
drifts included above, there are also fluctuations in intensities that can be slow compared to
sequence duration but are too fast to be calibrated out. For example, these can arise from
fluctuations in laser power or from noise in the position of the laser beams with respect to
the ions, due to vibration and air movement. Such fluctuations in intensity lead to loss of
visibility in Rabi flopping curves [?, 23]. From such curves, we determined that the Rabi
rate on the carrier transition with the non-copropagating beams fluctuates by ΔΩ/Ω = 0.029(1)
from experiment to experiment. This results in a contribution of 2 × 10⁻³ [18] to the phase
gate EPG.

Due to the finite Lamb-Dicke parameter for the ions, fluctuations in ion motional energy
can cause errors in the MS gate [23, 37]. As in Ref. [32], we estimate that each motional
mode is cooled to an average excitation of at most 0.2 quanta before the implementation of
each phase gate. This leads to errors in the MS gate of 6 × 10⁻⁴ due to the finite excitation
of the modes directly involved in the gate and 1 × 10⁻³ due to the fluctuating Debye-Waller
factors of the other modes combined [37].



Intrinsic background heating of the ions results in motional decoherence in the MS gate
while the spins and motion of the ions are entangled. We measure a heating rate for a single
⁹Be⁺ ion of 0.3 to 0.5 quanta per millisecond in the common axial mode at a confinement
frequency of 2.7 MHz. However, the motional modes used for the MS gate have only a small
component of the center-of-mass motion. Conditions here are essentially the same as those
of a previous experiment [20] and imply a contribution of less than 10⁻³ to the phase gate's
error.

FIG. 6. Scatter in the error for randomized benchmarking data. The open circles show
one minus the experimentally measured average probabilities of error for the individual sequences
of random Clifford unitaries as a function of sequence length for the data shown in red in Fig. 4.
The total numbers of sequences shown at sequence lengths 1, 2, 3, 4, 5, 6 are 44, 54, 52, 38, 28, 15,
respectively. The average error is the percentage of times at least one of the qubits was not found
in the expected state in 100 experiments. The red error bars to the left of the data at each length
show the expected standard deviation if the error variation is due to variation in the number of
phase gates needed to implement the random Clifford unitaries used in the sequences and the
binomial statistics for the 100 runs for each sequence. The black error bars show the standard
deviation of the set of fidelities measured for the corresponding length. The solid line is the fit to
Eq. (1).

To perform the phase gate, the optical path lengths of the two non-copropagating beams 
should ideally remain constant throughout the gate's three-pulse sequence, whose duration is
110 μs. Measurements of the optical beat note between the non-copropagating Raman beams
give a linewidth of order 10 Hz; from this we estimate that optical-path-length fluctuations
result in a phase-gate error of the order of 10⁻³, assuming the relative phase fluctuations at
the beat-note detector are the same as those experienced by the ions. 

The qubit coherence time and thus the benchmark error probabilities are affected by 
fluctuations in the magnetic field and its gradients, which cause differential frequency shifts 
in the qubits. Experimentally we determine a qubit coherence time by measuring the decay 
of the contrast in a two-pulse Ramsey experiment as a function of the duration between 
pulses. We find a coherence time of 4.1 ± 1.7 s. Due to the resolution limit of our frequency 
synthesizer, which serves as a clock to keep track of the qubit phases during the randomized 
benchmarking experiments, we are systematically detuned from the qubit frequency by 
270 mHz. The error due to this frequency offset should be negligible, given the typical time 
required for a Clifford unitary of a few milliseconds. We measure a magnetic-field difference
between trap zones A and B of 1.5 × 10^−? T, which leads to a systematic frequency difference
between the qubits of a few millihertz, depending on the exact value of the magnetic field.
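A rough estimate supports the claim that the 270 mHz offset is negligible. The sketch assumes the offset acts as an unintended z rotation by φ = 2π δf t accumulated over one Clifford unitary of duration t. It uses the standard average-gate infidelity (2/3) sin²(φ/2) of a one-qubit rotation error. The function name is hypothetical.

```python
import math

def z_offset_infidelity(delta_f_hz, duration_s):
    """Average gate infidelity of an unintended z rotation by 2*pi*delta_f*t on one qubit.

    Uses 1 - F_avg = (2/3) * sin^2(phi/2) for a single-qubit rotation error by phi.
    """
    phi = 2 * math.pi * delta_f_hz * duration_s
    return (2 / 3) * math.sin(phi / 2) ** 2

# 270 mHz offset over one Clifford unitary of 4.5 ms (average duration quoted earlier)
print(f"{z_offset_infidelity(0.270, 4.5e-3):.1e}")
```

This gives roughly 10⁻⁵ per Clifford unitary, several orders of magnitude below the measured EPO.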

As an independent check on phase-gate fidelity, we measured the state fidelity for a Bell 
state created by use of the phase gate G, as was done in Refs. [?, 15, 18]. Such measurements
were performed before and after the randomized benchmarking data were taken. Before the
benchmark we determined a Bell state fidelity of 0.91(2). After the benchmark we obtained
0.90(2). These fidelities include errors due to imperfect state initialization, detection, and
three carrier π/2-pulses using the co-propagating beams that are needed to prepare and analyze
the state. A measurement of the state fidelity where the Bell state was prepared using
only the MS gate and analyzed with a single non-copropagating carrier π/2-pulse, thereby
removing one non-copropagating carrier pulse and three co-propagating carrier pulses from 
the measurement, gave 0.94(1). The fidelities are consistent with the EPG determined by 
the benchmark. 

Errors for one-qubit gates implemented with copropagating beams are likely dominated 
by changes in the Rabi rate [5]. An indication of whether or not long-term drifts may have
affected the two-qubit benchmark can be obtained by comparing two one-qubit benchmarks.
The first was run immediately after the two-qubit benchmark, without recalibrating. The
second followed recalibration of the relevant pulses and is the one that we reported above. 
The first one found EPGs 0.009(2) and 0.012(3) for the two qubits, respectively, suggesting 
that at least the second qubit's gates may have been in need of recalibration by the end of 
the two-qubit benchmarks. 

Error can also be caused by loss of ions due to background gas collisions. We checked 
for loss of ions after each sequence, and if an ion-loss event was detected, we removed the 
previous sequence from the data set. We observed a significant increase in the rate of ion loss 
events for sequences involving more than 16 ion separation/recombination processes — one 
such process is needed for each phase gate performed. This limited the maximum length 
of the sequences used in the randomized benchmarking. The reason for the increase in the 
rate of ion loss is not understood. 

In summary, the errors discussed in the previous paragraphs amount to a phase gate 
EPG of about 0.048 (linearized, incoherent error addition) to be compared to the benchmark- 
determined EPG of 0.069±0.017. The EPG estimate of 0.048 includes 0.013 for spontaneous 
emission, 0.03 for calibration imperfections, 0.002 for intensity fluctuations, 0.0016 for ion 
motion, 0.0005 for motional heating, and 0.001 for optical path length fluctuations. The 
fact that our measured error is greater than our estimated error based on known physical 
sources suggests that our model of errors for the phase gate is incomplete. 
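The budget total can be tallied directly from the listed contributions. The sketch below only adds the numbers quoted in the text, using the same linearized, incoherent addition. It also reports the part of the benchmarked EPG that the budget leaves unexplained. The names are hypothetical.

```python
# Contributions to the phase-gate EPG quoted in the text
BUDGET = {
    "spontaneous emission": 0.013,
    "calibration imperfections": 0.03,
    "intensity fluctuations": 0.002,
    "ion motion": 0.0016,
    "motional heating": 0.0005,
    "optical path length fluctuations": 0.001,
}
MEASURED_EPG = 0.069  # benchmark result

def tally(budget, measured):
    """Linear, incoherent sum of error contributions and the unexplained remainder."""
    total = sum(budget.values())
    return total, measured - total

total, unexplained = tally(BUDGET, MEASURED_EPG)
for source, value in sorted(BUDGET.items(), key=lambda kv: -kv[1]):
    print(f"{source:34s} {value:.4f}")
print(f"estimated total {total:.4f}, unexplained {unexplained:.4f}")
```

The total is 0.0481 and the remainder is about 0.021. For comparison, the benchmark's standard error is 0.017.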

VIII. MULTI-QUBIT RANDOMIZED BENCHMARKING 

Clifford benchmarks as defined above can serve as a platform-independent strategy for 
comparing the quality of quantum operations in a computational context. For n qubits, the 
expressions for the EPOs and EPGs of Eqs. (1) and (?) are generalized as follows:

2" — 1 
^(0 = -1^ (1 - (1 - 2%^/(2" - 1))(1 - 2%/(2" - 1))') , (5) 

= '^1^ A _ l-2"e;/(2"-l) \ 
^^ 2" V l-2"e5/(2"-l); ' ^' 

where G is a gate being characterized by insertion after each step. These equations can be
established under idealizing assumptions in the same way as Eqs. (?) and (?), after observing that
for n qubits, the probability that the sequence does not depolarize is $1 - 2^n E(l)/(2^n-1)$. The
comparison on the basis of EPOs and EPGs obtained from the length-dependent loss of fidelity
makes sense in the absence of significant deviations from the simple exponential-decay model.
The thus-measured EPGs can be meaningfully interpreted as true average gate errors in the
idealized case where the errors are independent, depolarizing and stationary. In general, the
connection to the actual error behavior of elementary gates is not well understood [9].
Nevertheless, we believe that sufficiently small randomized-benchmark-determined EPGs are a
good indication of gate quality that can be compared to a general-purpose fault-tolerance
threshold goal such as the often-mentioned 10⁻⁴. In the above, we have considered how to
take into account the indications for non-exponential decay in our data. Generally, if there 
is clear evidence of non-exponential decay, the behavior and range of observed EPOs and 
EPGs and the extent to which stationary behavior was achieved need to be discussed. Given 
sufficiently many sequence lengths, these ranges can be determined by considering different- 
length intervals. Initial transients in error behavior may be expected even in the case where 
stationary behavior is achieved for longer sequences and can be analyzed separately. 
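Eqs. (5) and (6) can be coded directly. The sketch below assumes, as in the idealized model, that reference and inserted-gate errors compose multiplicatively through their depolarizing parameters. The function names are hypothetical.

```python
def depolarizing_param(eps, n):
    """Depolarizing parameter 1 - 2^n eps / (2^n - 1) for an n-qubit error eps."""
    d = 2 ** n
    return 1 - d * eps / (d - 1)

def eps_from_param(p, n):
    """Inverse of depolarizing_param."""
    d = 2 ** n
    return (d - 1) / d * (1 - p)

def error_probability(l, n, eps_m, eps_g):
    """Eq. (5): average probability of error after a sequence of length l."""
    d = 2 ** n
    p_m, p_g = depolarizing_param(eps_m, n), depolarizing_param(eps_g, n)
    return (d - 1) / d * (1 - p_m * p_g ** l)

def eps_inserted_gate(n, eps_g, eps_g_prime):
    """Eq. (6): EPG of the inserted gate from the reference and interleaved EPOs."""
    return eps_from_param(depolarizing_param(eps_g_prime, n) / depolarizing_param(eps_g, n), n)

# Round trip with the two-qubit numbers reported above (assumes the Clifford
# and inserted-gate depolarizing parameters multiply).
p_step = depolarizing_param(0.162, 2) * depolarizing_param(0.069, 2)
eps_g_prime = eps_from_param(p_step, 2)
print(round(eps_g_prime, 3), round(eps_inserted_gate(2, 0.162, eps_g_prime), 3))
```

For n = 1, error_probability reduces to the one-qubit fit function of Fig. 5. With ε_g = 0.162 and ε_G = 0.069, the model predicts an interleaved error per step of about 0.216, from which Eq. (6) recovers 0.069.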

The translation of a given Clifford unitary into a circuit of elementary gates suitable for 
a given platform is up to the experimenter, so some improvements are possible by greater 
efficiency of the translation rather than higher quality gates. We consider such "software" 
improvements to be potentially as useful as strictly "hardware"-based ones. Furthermore,
they are usually easier for others to implement. However, we believe that for small numbers 
of qubits, such circuit translations are already sufficiently close to optimal for software 
improvements of this sort to be self-limiting. The individual gate benchmarks implemented 
by inserting specific gates after the Clifford unitaries can show directly how much the gate 
quality has improved, independent of how the Clifford unitaries have been translated into 
circuits of elementary gates. 

When benchmarking n qubits, we suggest that the benchmarks are applied to different 
subsets of the qubits so that comparable EPOs are obtained for ra = 1, 2, 3, . . . qubits. We 
recommend that such benchmarks be applied in parallel to disjoint subsets, if possible. This 
solves the problem of comparing new results to earlier ones involving platforms with fewer 
qubits. Nevertheless, it would be helpful to have a way of comparing EPOs for the Clifford 
benchmark that is independent of the number of qubits. One possibility is to divide the 
EPO by C(n), the average over Clifford unitaries of the minimum number of controlled-not
gates needed to implement them with a circuit consisting of controlled-not gates and
arbitrary one-qubit Clifford gates. For two qubits, this normalization factor is
C(2) = 1.5, so the normalized Clifford EPO for our benchmark is 0.108(5). For three
qubits, we determined C(3) = 3.51 (rounded in the last digit). Since the number of Clifford
unitaries grows very rapidly with n, it may be difficult to determine the normalization
factor exactly for n > 4. It is known that C(n) scales as $n^2/\log n$ [27, 28, 38]. While
this complexity may seem relatively large at first, any viable platform must be able to 
implement circuits of this size, if not much larger. In particular, any successful demonstration 
of the Clifford benchmark also establishes the ability to implement non-trivial circuits for 
algorithmic purposes. 
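The two-qubit normalization factor and the normalized EPO can be checked with a short calculation. The sketch below relies on the known decomposition of the 11520 two-qubit Clifford unitaries (modulo global phase) into 576 local ones (0 controlled-not gates), 5184 CNOT-like ones (1), 5184 iSWAP-like ones (2), and 576 SWAP-like ones (3); these class sizes are not stated in the text above and are supplied here as outside background. The EPO 0.162(8) is the measured value from this work.

```python
from fractions import Fraction

# Two-qubit Clifford classes (modulo global phase), keyed by the minimum number
# of controlled-not gates when arbitrary one-qubit Clifford gates are free:
# local (0), CNOT-like (1), iSWAP-like (2), SWAP-like (3).
CLASS_SIZES = {0: 576, 1: 5184, 2: 5184, 3: 576}

total = sum(CLASS_SIZES.values())
c2 = Fraction(sum(k * size for k, size in CLASS_SIZES.items()), total)

# Normalize the measured Clifford EPO of 0.162(8) by C(2).
epo, epo_err = 0.162, 0.008
normalized_epo = epo / float(c2)
normalized_err = epo_err / float(c2)

print(total, c2)                                              # 11520 3/2
print(f"{normalized_epo:.3f}({normalized_err * 1000:.0f})")  # 0.108(5)
```

Exact rational arithmetic makes it clear that C(2) is exactly 3/2 rather than a rounded value; for larger n the same average would have to be taken over a far larger set of Clifford unitaries, which is why exact values become hard to obtain.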

If the primary purpose of the experiment is to benchmark individual gates by inserting 
them after randomized unitaries, it is desirable to find ways to achieve sufficient random- 
ization that are more efficient than random Clifford unitaries. In particular, to exhibit an 
EPG of a gate in this way, it suffices for the random unitaries to approximate a so-called 
unitary 2-design as explained in [l|]. Such approximations are possible with circuits involving
a logarithmic multiple of n two-qubit gates [l|. How to best translate these theoretical ideas 
into a practical benchmark remains to be determined. 
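Whether a finite set of unitaries forms a unitary 2-design can be tested numerically with the frame potential $\frac{1}{|G|^2}\sum_{U,V\in G} |\mathrm{tr}(U^\dagger V)|^4$, which is at least its Haar value (2 for qubits) and equals it exactly for a 2-design. The sketch below applies this standard diagnostic, which is not part of the protocol described here, to the 24 one-qubit Clifford gates (generated from H and S modulo global phase) and to the four Pauli operators for contrast. It illustrates only the exact one-qubit case, not the approximate multi-qubit designs discussed in the text; the helper names are hypothetical.

```python
import itertools

import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)


def phase_free_key(u):
    """Hashable key for u with its global phase removed."""
    flat = u.flatten()
    k = np.flatnonzero(np.abs(flat) > 1e-9)[0]
    v = flat * (abs(flat[k]) / flat[k])  # first nonzero entry made real and positive
    return tuple(np.round(v, 6))


def generate_group(generators):
    """Close a set of 2x2 unitaries under multiplication, modulo global phase."""
    identity = np.eye(2, dtype=complex)
    group = {phase_free_key(identity): identity}
    frontier = [identity]
    while frontier:
        new = []
        for u in frontier:
            for g in generators:
                w = g @ u
                key = phase_free_key(w)
                if key not in group:
                    group[key] = w
                    new.append(w)
        frontier = new
    return list(group.values())


def frame_potential(unitaries, t=2):
    """Average of |tr(U^dagger V)|^(2t) over all ordered pairs (U, V)."""
    total = sum(abs(np.trace(u.conj().T @ v)) ** (2 * t)
                for u, v in itertools.product(unitaries, repeat=2))
    return total / len(unitaries) ** 2


cliffords = generate_group([H, S])
paulis = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]
print(len(cliffords), frame_potential(cliffords))  # 24 gates, frame potential 2: a 2-design
print(frame_potential(paulis))                     # 4: the Paulis alone are not a 2-design
```

The contrast shows why randomizing over Pauli operators alone would not suffice for extracting an EPG in this way, while the Clifford group does.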

For the Clifford benchmark, the strategy for choosing the final unitary Ci of a sequence 
given above is more constrained than necessary for ensuring that the measurement outcome 
is deterministic in the absence of error. This may result in less effective error depolarization. 
We suggest two alternatives that greatly reduce the constraints on Ci. The first is to choose 
Ci uniformly at random from all Clifford unitaries that ensure that the final state in the 
absence of error is a logical state. This is equivalent to constructing Ci as an implementation 
of the inverse of the previous unitaries followed by a random gate that can be decomposed 
into CNOT and Pauli-product operators. An even more randomizing approach is to choose Ci
as suggested in Ref. J5[. In this case, Ci is composed of a uniformly random Clifford unitary 
followed by one-qubit Clifford gates randomly chosen to ensure that a randomly chosen 
joint Z-measurement is deterministic in the absence of errors. By a joint Z-measurement
we mean a measurement of a product of $\sigma_z$ operators on a subset of the qubits. The product's
eigenvalues are ±1 and can be determined by multiplying the standard-basis measurement
outcomes of the qubits in the subset, where a qubit's 0 and 1 measurement outcomes are
mapped to 1 and −1, respectively. This strategy takes advantage of the often much better fidelity
of one-qubit gates for the sequence-dependent part of the last unitary. A disadvantage is 
that instead of n deterministic bit values, only one bit value is obtained in each run of the 
experiment. To fit the resulting data, we use Eqs. (5) and (6) with n = 1.
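The classical post-processing for a joint Z-measurement is simple: each qubit's standard-basis outcome is mapped to ±1 and the signs are multiplied over the chosen subset, giving the single bit obtained per run. The sketch below shows only this step. The helper `random_nonempty_subset` is hypothetical; it excludes the empty subset because the empty product is the identity and yields no information, which is an assumption rather than a detail taken from the referenced scheme.

```python
import random


def joint_z_eigenvalue(bits, subset):
    """Eigenvalue (+1 or -1) of the product of sigma_z over `subset`.

    A standard-basis outcome 0 maps to +1 and 1 maps to -1; the joint
    eigenvalue is the product of these signs over the chosen qubits.
    """
    value = 1
    for q in subset:
        value *= -1 if bits[q] else 1
    return value


def random_nonempty_subset(n_qubits, rng):
    """Hypothetical helper: uniformly random nonempty subset of qubit indices."""
    while True:
        subset = [q for q in range(n_qubits) if rng.random() < 0.5]
        if subset:  # reject the empty subset, which always gives +1
            return subset


measured_bits = [0, 1, 1]
print(joint_z_eigenvalue(measured_bits, [1, 2]))  # two -1 factors: +1
print(joint_z_eigenvalue(measured_bits, [0, 1]))  # one -1 factor: -1
subset = random_nonempty_subset(3, random.Random(7))
print(subset, joint_z_eigenvalue(measured_bits, subset))
```

In a run, the one-bit result would be recorded as a success when this eigenvalue matches the value that the final one-qubit Clifford gates make deterministic in the absence of errors.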

IX. CONCLUSION 

In summary, we have described a protocol for randomized benchmarking of gates in a 
quantum information processor and implemented the protocol experimentally on two qubits 
to measure the error per operation of arbitrary two-qubit Clifford unitaries. The protocol 
we propose is independent of the gate set that is experimentally implemented and so can 
provide an easily portable method for evaluating the performance of Clifford unitaries on 
different physical platforms. Furthermore, with this method it is straightforward to isolate 
the fidelity of a specific two-qubit gate. We have emphasized some of the consistency checks 
that can be performed to qualify the reported errors per operation or gate. Looking ahead, 
this randomized benchmarking protocol should prove useful as different experimental im- 
plementations of quantum information processors aim to increase the number of qubits and 
work to decrease the errors toward what is required for fault tolerance.